Performance Improvement of Dysarthric Speech Recognition Using Context-Dependent Pronunciation Variation Modeling Based on Kullback-Leibler Distance
Authors
Abstract
In this paper, we propose context-dependent pronunciation variation modeling based on the Kullback-Leibler (KL) distance to improve the performance of dysarthric automatic speech recognition (ASR). To this end, we construct a triphone confusion matrix from the KL distances between triphone models and build a weighted finite state transducer (WFST) from that matrix. Dysarthric speech is first recognized by a baseline ASR system, and the phoneme sequence of the recognized sentence is then passed through the WFST to correct recognition errors. Dysarthric ASR experiments show that error correction based on the proposed method yields relative reductions in average word error rate of 16.54% and 3.34%, compared with an ASR system without any error correction and with one using error correction based on a conventional context-dependent phoneme confusion matrix, respectively.
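The step from triphone models to a confusion matrix can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: each triphone is reduced to a single diagonal-covariance Gaussian (real systems use HMMs with mixture states), the KL distance is symmetrized by summing both directions, distances are turned into confusion probabilities with `exp(-d)` followed by row normalization, and WFST arc weights are taken as negative log probabilities. The triphone names and parameters in `toy_models` are hypothetical.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(p, q):
    """Symmetrized KL distance; p and q are (means, variances) pairs."""
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)

def confusion_matrix(models):
    """Row-normalized similarity matrix (assumed mapping: exp(-distance))."""
    names = sorted(models)
    matrix = {}
    for a in names:
        scores = {b: math.exp(-symmetric_kl(models[a], models[b])) for b in names}
        norm = sum(scores.values())
        matrix[a] = {b: s / norm for b, s in scores.items()}
    return matrix

def arc_costs(matrix):
    """Negative-log weights, as they could label arcs of a correction WFST."""
    return {
        (a, b): -math.log(p)
        for a, row in matrix.items()
        for b, p in row.items()
        if p > 0.0
    }

# Hypothetical toy models: triphone name -> (means, variances).
toy_models = {
    "a-b+c": ([0.0, 0.0], [1.0, 1.0]),
    "a-p+c": ([0.5, 0.0], [1.0, 1.0]),
    "a-s+c": ([3.0, 3.0], [1.0, 1.0]),
}

matrix = confusion_matrix(toy_models)
costs = arc_costs(matrix)
```

In this toy setup, acoustically close triphones such as `a-b+c` and `a-p+c` receive a higher confusion probability, and therefore a lower arc cost, than distant ones such as `a-b+c` and `a-s+c`. A real system would load these costs into a WFST toolkit and compose the resulting transducer with the recognized phoneme sequence.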
Similar resources
Dysarthric Speech Recognition Based on Error-Correction in a Weighted Finite State Transducer Framework
In this paper, a dysarthric speech recognition error-correction method in a weighted finite state transducer (WFST) framework is proposed to improve the performance of dysarthric automatic speech recognition (ASR). To this end, pronunciation variation models are constructed from a context-dependent confusion matrix based on a weighted Kullback-Leibler (KL) distance between triphones. Then, a WF...
Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model
Dysarthria is a neuro-motor speech disorder that impedes the physical production of speech. Patients with dysarthria often have trouble in pronouncing certain sounds, resulting in undesirable phonetic variation. Current automatic speech recognition systems designed for the general public are ineffective for dysarthric sufferers due to the phonetic variation. In this paper, we investigate dysart...
Multiple-Pronunciation Lexical Modeling Based on Phoneme Confusion Matrix for Dysarthric Speech Recognition
In this paper, we propose speaker-dependent multiple-pronunciation lexical modeling for improving the performance of dysarthric automatic speech recognition (ASR). For each dysarthric speaker, a phoneme confusion matrix is first constructed from the results of phoneme recognition. Then, pronunciation variation rules are extracted by investigating the phoneme confusion matrix, and they are incor...
Using Kullback-Leibler distance for performance evaluation of search designs
This paper considers the search problem, introduced by Srivastava. This is a model discrimination problem. In the context of search linear models, discrimination ability of search designs has been studied by several researchers. Some criteria have been developed to measure this capability, however, they are restricted in a sense of being able to work for searching only one possibl...
On recognition of non-native speech using probabilistic lexical model
Despite various advances in automatic speech recognition (ASR) technology, recognition of speech uttered by non-native speakers is still a challenging problem. In this paper, we investigate the role of different factors such as type of lexical model and choice of acoustic units in recognition of speech uttered by non-native speakers. More precisely, we investigate the influence of the probabili...